Distributed learning of CNNs on heterogeneous CPU/GPU architectures
Convolutional Neural Networks (CNNs) have proven to be powerful classification
tools in tasks that range from check reading to medical diagnosis, reaching
close to human perception and in some cases surpassing it. However, the
problems to solve are becoming larger and more complex, which translates into
larger CNNs and longer training times that not even the adoption of
Graphics Processing Units (GPUs) can keep up with. This problem is partially
solved by using more processing units and the distributed training methods
offered by several frameworks dedicated to neural network training. However,
these techniques do not take full advantage of the parallelization
opportunities offered by CNNs or of the cooperative use of heterogeneous
devices with different processing capabilities, clock speeds, and memory
sizes. This paper
presents a new method for the parallel training of CNNs that can be considered
as a particular instantiation of model parallelism, where only the
convolutional layer is distributed. In fact, the convolutions processed during
training (forward and backward propagation included) account for -\%
of the global processing time. The paper analyzes the influence of network size,
bandwidth, batch size, number of devices, including their processing
capabilities, and other parameters. Results show that this technique is capable
of diminishing the training time without affecting the classification
performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with
two convolutional layers with - and - kernels, respectively, the best
speedups achieved are - using four CPUs and - using three GPUs.
Modern imaging datasets, larger and more complex than CIFAR-10, will certainly
require more than -\% of the processing time for calculating convolutions, and
speedups will tend to increase accordingly.
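For intuition, one natural way to distribute only the convolutional layer is to shard its kernel stack across devices, with each device computing a subset of the feature maps. The sketch below simulates this in NumPy and checks that the sharded result matches the single-device one; `conv2d_valid` and `distributed_conv` are hypothetical helpers, not the authors' implementation.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Naive valid 2-D convolution of one image with a stack of kernels."""
    kh, kw = kernels.shape[1:]
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], oh, ow))
    for c, k in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[c, i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

def distributed_conv(image, kernels, n_devices):
    """Model parallelism on the convolutional layer: split the kernel stack
    across devices and concatenate the per-device feature maps."""
    shards = np.array_split(kernels, n_devices)        # one shard per device
    outputs = [conv2d_valid(image, s) for s in shards]  # computed in parallel
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernels = rng.standard_normal((6, 3, 3))
serial = conv2d_valid(image, kernels)
parallel = distributed_conv(image, kernels, n_devices=3)
assert np.allclose(serial, parallel)  # sharding preserves the result
```

Because each shard is independent, heterogeneous devices can be given shards sized to their processing capabilities, which is the setting the paper studies.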
A logic for n-dimensional hierarchical refinement
Hierarchical transition systems provide a popular mathematical structure to
represent state-based software applications in which different layers of
abstraction are represented by inter-related state machines. The decomposition
of high level states into inner sub-states, and of their transitions into inner
sub-transitions is a common refinement procedure adopted in a number of
specification formalisms.
This paper introduces a hybrid modal logic for k-layered transition systems,
its first-order standard translation, a notion of bisimulation, and a modal
invariance result. Layered and hierarchical notions of refinement are also
discussed in this setting.
Comment: In Proceedings Refine'15, arXiv:1606.0134
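The layered picture can be illustrated with a toy 2-layered system: each abstract state is refined into sub-states, and a concrete transition must either stay inside one abstract state or project onto an abstract transition. The state names and the `respects_refinement` predicate below are illustrative only; the paper's logic and refinement notions are far richer.

```python
# A toy 2-layered transition system: abstract states refine into sub-states.
abstract = {("Idle", "Run"), ("Run", "Idle")}          # layer-0 transitions
refinement = {"Idle": {"i0"}, "Run": {"r0", "r1"}}     # sub-states per state
concrete = {("i0", "r0"), ("r0", "r1"), ("r1", "i0")}  # layer-1 transitions

def respects_refinement(abstract, concrete, refinement):
    """Check that every concrete transition either stays inside one abstract
    state or projects onto some abstract transition."""
    loc = {s: a for a, subs in refinement.items() for s in subs}
    for (p, q) in concrete:
        if loc[p] != loc[q] and (loc[p], loc[q]) not in abstract:
            return False
    return True

assert respects_refinement(abstract, concrete, refinement)
```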
Distribution-Based Categorization of Classifier Transfer Learning
Transfer Learning (TL) aims to transfer knowledge acquired in one problem,
the source problem, onto another problem, the target problem, dispensing with
the bottom-up construction of the target model. Due to its relevance, TL has
gained significant interest in the Machine Learning community since it paves
the way to devise intelligent learning models that can easily be tailored to
many different applications. As is natural in a fast-evolving area, a wide
variety of TL methods, settings, and nomenclatures have been proposed so far.
However, a wide range of works report different names for the same concepts.
This mixture of concepts and terminology obscures the TL field and hinders
its proper consideration. In this paper we present a
review of the literature on the majority of classification TL methods, and also
a distribution-based categorization of TL with a common nomenclature suitable
to classification problems. Under this perspective, three main TL categories
are presented, discussed, and illustrated with examples.
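A distribution-based view compares the source and target joint P(X, Y) through its marginal and conditional parts. The sketch below uses generic shift labels (covariate, concept, joint shift) to illustrate the idea; these are not necessarily the paper's exact category names.

```python
def tl_category(same_marginal, same_conditional):
    """Classify a transfer scenario by which parts of P(X, Y) = P(X) P(Y|X)
    differ between the source and target problems. Labels are generic
    illustrations, not the paper's exact taxonomy."""
    if same_marginal and same_conditional:
        return "no shift: ordinary supervised learning"
    if not same_marginal and same_conditional:
        return "covariate shift: P(X) differs, P(Y|X) shared"
    if same_marginal and not same_conditional:
        return "concept shift: P(Y|X) differs, P(X) shared"
    return "joint shift: both P(X) and P(Y|X) differ"

assert "covariate" in tl_category(False, True)
assert "concept" in tl_category(True, False)
```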
Improving Makespan in Dynamic Task Scheduling for Cloud Robotic Systems with Time Window Constraints
A scheduling method in a robotic network cloud system with minimal makespan
is beneficial as the system can complete all the tasks assigned to it in the
fastest way. Robotic network cloud systems can be translated into graphs where
nodes represent hardware with independent computing power and edges represent
data transmissions between nodes. Time-window constraints on tasks are a
natural way to order tasks. The makespan is the maximum amount of time between
when a node starts executing its first scheduled task and when all nodes have
completed their last scheduled task. Load balancing allocation and scheduling
ensures that the time between when the first node completes its scheduled tasks
and when all other nodes complete their scheduled tasks is as short as
possible. We propose a new load balancing algorithm for task allocation and
scheduling with minimal makespan. We theoretically prove the correctness of the
proposed algorithm and present simulations illustrating the obtained results.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
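To make the makespan objective concrete, the classic longest-processing-time (LPT) heuristic below greedily assigns each task to the currently least-loaded node. This is a standard baseline for illustration only; it is not the authors' algorithm and ignores the paper's time-window constraints.

```python
import heapq

def lpt_schedule(task_times, n_nodes):
    """Greedy longest-processing-time allocation: assign each task to the
    least-loaded node, approximately minimising the makespan (the largest
    total load on any node)."""
    loads = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(loads)
    assignment = {node: [] for node in range(n_nodes)}
    for t in sorted(task_times, reverse=True):   # longest tasks first
        load, node = heapq.heappop(loads)        # least-loaded node
        assignment[node].append(t)
        heapq.heappush(loads, (load + t, node))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

tasks = [4, 3, 3, 2, 2, 2]                       # 16 units of work in total
_, makespan = lpt_schedule(tasks, n_nodes=2)
assert makespan == 8                             # perfectly balanced split
```

Load balancing in the abstract's sense corresponds to keeping the gap between the most- and least-loaded nodes small, which LPT does greedily.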
Contrastive Learning from Demonstrations
This paper presents a framework for learning visual representations from
unlabeled video demonstrations captured from multiple viewpoints. We show that
these representations are applicable for imitating several robotic tasks,
including pick and place. We optimize a recently proposed self-supervised
learning algorithm by applying contrastive learning to enhance task-relevant
information while suppressing irrelevant information in the feature embeddings.
We validate the proposed method on the publicly available Multi-View Pouring
dataset and a custom Pick and Place dataset, and compare it with the TCN
triplet baseline. We evaluate the learned representations using three metrics:
viewpoint alignment, stage classification, and reinforcement learning; in all
cases the results improve over state-of-the-art approaches, with the added
benefit of a reduced number of training iterations.
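The contrastive objective can be sketched with an InfoNCE-style loss: each anchor embedding (one viewpoint at a time step) should match its own positive (the other viewpoint at the same step) against all other pairs in the batch. This is a generic formulation for illustration, not the paper's exact loss.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: the correct anchor/positive pair sits
    on the diagonal of the similarity matrix and should dominate its row."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # penalise the true pairs

rng = np.random.default_rng(1)
z = rng.standard_normal((4, 8))
aligned = info_nce(z, z)         # viewpoints perfectly aligned: low loss
shuffled = info_nce(z, z[::-1])  # mismatched pairs: high loss
assert aligned < shuffled
```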
Population-based JPEG Image Compression: Problem Re-Formulation
The JPEG standard is widely used in different image processing applications.
One of the main components of the JPEG standard is the quantisation table (QT)
since it plays a vital role in the image properties such as image quality and
file size. In recent years, several efforts based on population-based
metaheuristic (PBMH) algorithms have been performed to find the proper QT(s)
for a specific image, although they do not take the user's opinion into
consideration. Consider, for example, an Android developer who prefers a
small image file, while the optimisation process results in a high-quality
image with a huge file size. Another pitfall of the current works is a lack of
comprehensive coverage, meaning that the QT(s) cannot provide all possible
combinations of file size and quality. Therefore, this paper makes
three distinct contributions. First, to include the user's opinion in the
compression process, the file size of the output image can be controlled by a
user in advance. Second, to tackle the lack of comprehensive coverage, we
suggest a novel representation. Our proposed representation can not only
provide more comprehensive coverage but also find the proper value for the
quality factor for a specific image without any background knowledge. Both
changes in representation and objective function are independent of the search
strategies and can be used with any type of population-based metaheuristic
(PBMH) algorithm. Therefore, as the third contribution, we also provide a
comprehensive benchmark on 22 state-of-the-art and recently-introduced PBMH
algorithms on our new formulation of JPEG image compression. Our extensive
experiments on different benchmark images and in terms of different criteria
show that our novel formulation of JPEG image compression works effectively.
Comment: 39 pages; this paper has been submitted to the related journal.
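A user-controlled objective of this kind can be sketched as a fitness function that a PBMH algorithm would minimise per candidate quantisation table: penalise deviation from the user's target file size while rewarding quality. The function name, weights, and candidate values below are illustrative assumptions, not the paper's exact objective.

```python
def fitness(file_size, quality, target_size, size_weight=1.0):
    """Illustrative single-objective fitness for a candidate QT / quality
    factor: deviation from the user's target file size minus the achieved
    quality (e.g. PSNR in dB). Lower is better."""
    return size_weight * abs(file_size - target_size) - quality

# Candidate QTs summarised by their (file size in KB, PSNR in dB) outcomes:
candidates = [(120, 38.0), (60, 34.5), (58, 31.0)]
best = min(candidates, key=lambda c: fitness(c[0], c[1], target_size=60))
assert best == (60, 34.5)  # hits the target size at the best quality
```

Because the fitness only scores a candidate, it is independent of the search strategy, matching the paper's claim that the formulation works with any PBMH algorithm.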